This paper demonstrates several machine learning classifiers for detecting
epileptic seizures quickly and accurately from electroencephalography (EEG)
signals in real time. The symptoms of epilepsy are caused by abnormal brain
activity. Analyzing and detecting epileptic seizures presents many challenges
because EEG signals are non-stationary and seizure patterns vary from patient
to patient. Moreover, EEG signals are noisy, which affects the seizure
detection process. Machine learning algorithms, on the other hand, can be
accurate and adaptive and can generalize well when provided with large and
diverse training data, and compared to other methods they can analyze the
complex structure of the EEG signal despite the noise. With this approach, the
features of epileptic seizures can be learned and used to correctly identify
other seizure cases. The demonstration compares various classifiers, including
random forests, K-nearest neighbors (K-NN), decision trees, support vector
machine (SVM), logistic regression and naive Bayes. Different performance
metrics are used, such as accuracy, receiver operating characteristics (ROC),
mean absolute error (MAE), root-mean-square error (RMSE) and, most
importantly, detection time for each algorithm. The Bonn University dataset is
used to demonstrate the classification of epileptic seizures.

1. INTRODUCTION

In epilepsy, the entire nervous system is affected, resulting in seizures. The symptoms vary and
include loss of consciousness and recurrent uncontrollable seizures. These symptoms cause many injuries to
the patient and, in some cases, death [1]. A seizure is a paroxysmal change of neurological function caused by
the excessive discharge of neurons in the brain. The term epileptic seizure is used to distinguish a seizure
caused by abnormal neuronal firing from a nonepileptic event, such as a psychogenic seizure. Epilepsy
is the condition of recurrent and unprovoked seizures. Epilepsy has many causes, each reflecting an underlying
brain malfunction. Epilepsy syndromes refer to a group of psychological and physiological characteristics that
always occur together with matching seizure types, age of onset, electroencephalography (EEG) findings,
triggering factors, genetics, natural history, prognosis, and response to antiepileptic drugs (AEDs).
Approximately 1% of the population suffers from epilepsy, and almost one-third of patients have refractory
epilepsy, in which seizures cannot be controlled by antiepileptic medications or other therapies.
Approximately 75% of epilepsy cases begin in early childhood, reflecting the high susceptibility of the
developing brain to seizures [2]. Considerable effort has therefore gone into predicting seizures earlier, and many
researchers have tried to develop EEG-based seizure detection methods. This is not an easy task because of the
complex nature of EEG signals, which are prone to noise and artifacts from muscle movements and eye
blinking [3]. EEG is a dynamic, non-invasive and relatively inexpensive technique used to monitor the
electrical activity of the brain in microvolts (µV) [4], [5]. Early EEG-based seizure
detection methods traditionally consisted of many complex intermediate steps [6]. They relied heavily
on frequency-domain [7], time-domain [8] and time-frequency-domain analysis [9], as well as on
understanding the nonlinearity of EEG signals [10] and the dynamic changes associated with different
physiological and psychological states of the body and the brain that introduce noise into EEG recordings [11].
These traditional methods encounter several challenges. First, EEG acquisition devices are very
susceptible to a wide range of noise caused by body movement during the recording session and by the
environment [12], [13]. Noise adds unwanted distortion and corrupts the features of the EEG signals, making
these seizure detection methods unreliable. Second, these methods are very sensitive to
noise and are not robust enough to discriminate between seizure features and noise in the EEG signals. Finally,
the traditional methods are programmed to detect seizures using several feature extraction and selection
methods, which in many cases remove important features that could be used to detect seizure patterns that
resemble healthy signal patterns. Machine learning classifiers, on the other hand, can overcome the limitations
of the traditional methods because they are trained on the raw data instead of being programmed explicitly
[14]. Given enough training data, machine learning algorithms perform much better, are more
robust, generalize well to unseen instances and can detect complex patterns in seizure signals despite
noise and artifacts [15]. The aim of this paper is to implement the most common machine learning
classifiers, train them on the well-known epileptic seizure dataset of the University of Bonn, and compare their
performance to determine the best classifiers [16] that can be used to develop a robust system
for detecting epileptic seizures from raw EEG signals in real time.


2. DATASET DESCRIPTION 

The original dataset used in this paper is the well-known benchmark dataset of the University of Bonn
for epileptic seizure detection. It consists of 500 EEG signals acquired from different patients; each signal
is a recording of brain activity lasting 23.6 seconds, sampled into 4,097 data points [10]. Each data point
is the value of the EEG recording at a different point in time. Each signal in the dataset is segmented into
23 smaller signals that preserve the label of the original, and each smaller signal contains 178 data points
(samples) covering 1 second. The dataset therefore contains 11,500 EEG signals, each a 1-second recording
of brain activity in microvolts. Each signal is labeled with a number that indicates the category to which it
belongs. $X$ denotes the full set of EEG signals obtained by segmenting each signal in the dataset into 23
signals (11,500 EEG signals). $Y$ denotes the labels (11,500 labels) associated with the signals in $X$.
$x^{(i)}$ denotes the $i$-th EEG signal in the dataset and represents a recording of brain activity for one
second. $y^{(i)}$ denotes the label associated with the signal $x^{(i)}$, with $y^{(i)} \in \{1, 2, 3, 4, 5\}$.
The EEG signals in the dataset are categorized into five classes, as listed in Table 1. Figure 1 illustrates
all categories of EEG signals. Figure 1(a) compares signals from the healthy eyes open (H.E.O.) and healthy
eyes closed (H.E.C.) classes, Figure 1(b) compares signals from the tumor area (T.R.) and away from the
tumor area (H.T.R.) classes, and Figure 1(c) compares all classes of EEG signals available in the dataset
and listed in Table 1.
Figure 1. Signals comparison graphs: (a) healthy eyes open versus healthy eyes closed, (b) tumor area versus
away from the tumor area, and (c) all types of signals in the dataset

Table 1. Dataset classes description

Class  Description                                                              Label   No. signals
1      EEG signals recorded during seizure activity                             S.R.    2,300
2      EEG signals recorded from the location of the tumor                      T.R.    2,300
3      EEG signals recorded from another location in the brain with the tumor   H.T.R.  2,300
4      EEG signals recorded from a healthy brain with eyes closed               H.E.C.  2,300
5      EEG signals recorded from a healthy brain with eyes open                 H.E.O.  2,300
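
The segmentation described in this section can be sketched in a few lines of Python. This is a minimal illustration rather than the original preprocessing code: the function names are hypothetical, the recordings are synthetic stand-ins, and because 23 x 178 = 4,094 the sketch assumes that the 3 trailing samples of each 4,097-point recording are simply dropped, which the text does not state.

```python
def segment_signal(signal, n_segments=23, segment_len=178):
    """Split one recording into equal-length chunks, dropping leftover samples."""
    needed = n_segments * segment_len
    if len(signal) < needed:
        raise ValueError("signal is too short for the requested segmentation")
    return [signal[k * segment_len:(k + 1) * segment_len] for k in range(n_segments)]


def segment_dataset(signals, labels, n_segments=23, segment_len=178):
    """Segment every recording; each chunk inherits the label of its parent."""
    X, Y = [], []
    for signal, label in zip(signals, labels):
        for chunk in segment_signal(signal, n_segments, segment_len):
            X.append(chunk)
            Y.append(label)
    return X, Y


# Synthetic stand-ins for two recordings of 4,097 samples each (assumed data).
recordings = [[float(t) for t in range(4097)], [0.0] * 4097]
X, Y = segment_dataset(recordings, [1, 5])
print(len(X), len(X[0]))  # number of chunks and samples per chunk
```

Applied to all 500 recordings, the same procedure yields 500 x 23 = 11,500 labeled one-second signals.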


3. RESEARCH METHOD 

Before classifying epileptic seizures, pre-processing techniques must be applied.
The preprocessing steps include converting the categorical labels into binary labels for the purpose of
binary classification (normal versus seizure). The steps of the automatic detection and classification process
are shown in Figure 2. The first step is to take the raw labeled dataset described in the previous section. The
next step is to create a training set and a test set in a stratified manner: the training set is used to train
and tune the classifiers and to find the optimal parameters for each one, while the test set is used to
measure the performance of each classifier on unseen data. The following preprocessing step is to
standardize the training and test sets. Once preprocessing is finished, the machine
learning classifiers are trained and tuned on the training data and then tested on the unseen test set. The
performance of each classifier described below is evaluated and recorded so that the performance
measurements can be compared across classifiers.

Figure 2. Classification pipeline


3.1. Data preprocessing 

There are five classes of signals in the dataset, but we are only interested in classifying each signal as a
seizure or non-seizure signal, so the classification is a binary task. Each signal with label
$y^{(i)} \in \{2, 3, 4, 5\}$ is treated as a non-seizure signal and given the label 0, while each signal with
label $y^{(i)} = 1$ is treated as a seizure signal and given the label 1. After this conversion, each label
satisfies $y^{(i)} \in \{0, 1\}$, which denotes either non-seizure (9,200 signals) or seizure (2,300 signals),
as shown in Table 2.


Table 2. Binary classes 


Class  Description                         No. signals  Percentage
0      Non-seizure signals in the dataset  9,200        80%
1      Seizure signals in the dataset      2,300        20%
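
As a small illustration of the label conversion summarized in Table 2, the sketch below maps the five original classes to binary labels and counts the result. The helper name `to_binary_label` and the synthetic label list are assumptions made for this example.

```python
from collections import Counter


def to_binary_label(y):
    """Map class 1 (seizure) to 1 and classes 2-5 (non-seizure) to 0."""
    return 1 if y == 1 else 0


# Synthetic label vector with 2,300 signals per original class.
labels = [c for c in (1, 2, 3, 4, 5) for _ in range(2300)]
binary = [to_binary_label(y) for y in labels]
counts = Counter(binary)
share_seizure = counts[1] / len(binary)
print(counts[0], counts[1], share_seizure)
```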


3.1.1. Constructing the train and test sets 

The total number of samples in the dataset is 11,500, and the label associated with each signal is either 0 or
1. The dataset is now imbalanced: non-seizure signals outnumber seizure signals (80% versus
20%), as shown in Table 2. When splitting the data into subsets, each subset must have the same
class distribution as the original dataset. The dataset is therefore split into two subsets in a stratified manner
that preserves the class distribution. The first is the training set, which is 80% of the original dataset
(9,200 EEG signals). The second is the test set, which is 20% of the original dataset (2,300 EEG signals)
[5], [14], [17].
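
A stratified 80/20 split of this kind can be sketched with the standard library alone. The function `stratified_split`, the fixed seed and the synthetic label vector are assumptions for illustration; the text does not say which tool performed the split.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_indices, test_indices) that preserve the class proportions."""
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for y in sorted(by_class):
        indices = by_class[y][:]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])   # test share drawn per class
        train_idx.extend(indices[n_test:])
    return train_idx, test_idx


labels = [0] * 9200 + [1] * 2300
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

Because the test share is drawn separately from each class, both subsets keep the 80%/20% ratio of non-seizure to seizure signals.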


3.1.2. Data standardization 

One of the most important steps before classification is dataset standardization, because many machine
learning algorithms can behave unpredictably when the features do not resemble standard normally
distributed data, that is, Gaussian with zero mean ($\mu = 0$) and unit variance ($\sigma^2 = 1$); in that case
the classifier's performance is degraded [18]. Standardization has been applied to the EEG data for the
classification process [19]. Because the test set must be kept aside and used only to measure performance
after each classifier has been trained, the mean and variance of the training set are used to standardize
both the training data and the test data. This step prevents any information about the test set from leaking
into the classifiers, so that the measured performance is an unbiased estimate of performance on unseen
data.
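
The following sketch illustrates the leakage-free standardization described here: the statistics are estimated on the training set only and then reused for the test set. It assumes per-feature means and standard deviations; the helper names and the tiny data values are hypothetical.

```python
import math


def fit_standardizer(rows):
    """Estimate the per-feature mean and standard deviation from the training rows."""
    n = len(rows)
    columns = list(zip(*rows))
    means = [sum(col) / n for col in columns]
    stds = []
    for col, m in zip(columns, means):
        var = sum((v - m) ** 2 for v in col) / n
        stds.append(math.sqrt(var) or 1.0)  # fall back to 1.0 for constant features
    return means, stds


def standardize(rows, means, stds):
    """Apply (value - mean) / std with statistics that were fitted elsewhere."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


train = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
test = [[4.0, 40.0]]
means, stds = fit_standardizer(train)          # training data only
train_std = standardize(train, means, stds)
test_std = standardize(test, means, stds)      # reuses the training statistics
```

The test rows never contribute to `means` or `stds`, which is what keeps the evaluation free of information from the test set.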


3.2. Machine learning algorithms 

Supervised machine learning algorithms can be described as mathematical functions that map a
labeled input to an output, with an intermediate computation called training that extracts
knowledge from the labeled input. The learning process is iterative and requires a lot of training and
hyper-parameter tuning to find the optimal parameters and the right configuration for each algorithm. With
sufficient data and an appropriate algorithm, seizure patterns can be identified and detected correctly. The
classification task is binary, meaning that each EEG signal is classified as either a seizure or a non-seizure
case. After the preprocessing step, the data is in a suitable form and ready to be fed into any machine
learning algorithm. The classification algorithms are presented as follows.


3.2.1. KNN classifier 

K-nearest neighbors (KNN) is a non-parametric classification technique that relies on storing the
feature vectors of the dataset in a high-dimensional vector space. It depends on a distance function
to estimate the nearness between a new feature vector and each feature vector stored
in that space [14], [20], [21]. The KNN algorithm then assigns the new feature vector the label held by the
majority of its k closest (nearest) stored feature vectors, based on a distance function (D) called the
Minkowski distance.


$$D(x^{(new)}, x^{(i)}) = \left( \sum_{j=1}^{n} \left| x_j^{(new)} - x_j^{(i)} \right|^p \right)^{1/p}$$
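
The Minkowski distance and the neighbor vote can be sketched as follows. The function names, the value k = 3 and the toy feature vectors are assumptions chosen for illustration; p = 2 gives the Euclidean distance and p = 1 the Manhattan distance.

```python
from collections import Counter


def minkowski(a, b, p=2):
    """Minkowski distance of order p between two equal-length feature vectors."""
    return sum(abs(u - v) ** p for u, v in zip(a, b)) ** (1.0 / p)


def knn_predict(train_X, train_y, x_new, k=3, p=2):
    """Label x_new by majority vote among its k nearest stored vectors."""
    ranked = sorted(zip(train_X, train_y),
                    key=lambda pair: minkowski(pair[0], x_new, p))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


train_X = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
train_y = [0, 0, 1, 1]
print(knn_predict(train_X, train_y, [0.2, 0.3]))
print(knn_predict(train_X, train_y, [5.5, 5.0]))
```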


3.2.2. Logistic regression classifier 

Logistic regression is used mainly for binary classification. It computes a weighted sum of the input
features and applies the sigmoid (S-shaped) activation function, which produces an output in the range
[0, 1]. Although the sigmoid is non-linear, the resulting decision boundary is linear in the input features.
Logistic regression uses a loss function called binary cross-entropy, which is based on maximum-likelihood
estimation [22]. Minimizing the loss function increases the probability (log-likelihood) that a feature vector
is assigned to its correct class [23]. The following equations describe the calculations performed by the
logistic regression classifier to compute the output and to optimize and update the weight parameters.


$$\hat{y}^{(i)} = \sigma(x^{(i)} \cdot w + b)$$


$$L(\hat{y}^{(i)}, y^{(i)}) = -\left[ y^{(i)} \log \hat{y}^{(i)} + (1 - y^{(i)}) \log(1 - \hat{y}^{(i)}) \right]$$
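
The sigmoid output and the binary cross-entropy loss can be written out directly. This is a minimal sketch: the function names are hypothetical, the loss is averaged over the samples, and the small clipping constant `eps` is an added safeguard against log(0) that is not part of the text.

```python
import math


def sigmoid(z):
    """S-shaped activation that maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(x, w, b):
    """Predicted probability of the seizure class for one feature vector."""
    return sigmoid(sum(xj * wj for xj, wj in zip(x, w)) + b)


def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy over all samples."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)


print(predict_proba([1.0, 2.0], [0.0, 0.0], 0.0))
print(binary_cross_entropy([1, 0], [0.9, 0.1]))
```
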
3.2.3. Stochastic gradient descent classifier

This classifier is an efficient parametric machine learning algorithm used in classification tasks.
The stochastic gradient descent (SGD) classifier uses plain stochastic gradient descent to iteratively
optimize the parameters of the classifier with respect to a convex loss function. The training data is fed to the
SGD classifier, which adapts its internal parameters to the dataset iteratively in order to
reach the global minimum of the convex loss function, where the error is smallest. The mean squared
error is used as the loss function because it is convex with a global minimum and is very
sensitive to large errors and outliers in the predicted labels [24]. The following equations describe the
calculations performed by the stochastic gradient descent classifier.


$$\hat{y}^{(i)} = x^{(i)} \cdot w + b$$


$$L(\hat{y}^{(i)}, y^{(i)}) = \mathrm{MSE}(\hat{y}^{(i)}, y^{(i)})$$
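
A per-sample gradient descent loop on the squared error might look like the sketch below. The learning rate, number of epochs, seed, toy data and the 0.5 decision threshold are all assumptions made for illustration and are not taken from the text.

```python
import random


def sgd_fit(X, y, lr=0.02, epochs=500, seed=0):
    """Fit w and b of a linear model with per-sample steps on the squared error."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the samples in a new random order each epoch
        for i in order:
            y_hat = sum(xj * wj for xj, wj in zip(X[i], w)) + b
            err = y_hat - y[i]
            # Gradient of (y_hat - y)^2 is 2 * err * x for w and 2 * err for b.
            w = [wj - lr * 2.0 * err * xj for wj, xj in zip(w, X[i])]
            b -= lr * 2.0 * err
    return w, b


def sgd_predict(x, w, b, threshold=0.5):
    """Turn the linear output into a 0/1 label (threshold is an assumption)."""
    return 1 if sum(xj * wj for xj, wj in zip(x, w)) + b >= threshold else 0


X = [[-2.0], [-1.0], [1.0], [2.0]]
y = [0, 0, 1, 1]
w, b = sgd_fit(X, y)
predictions = [sgd_predict(x, w, b) for x in X]
```
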
3.2.4. Decision tree classifier

This classifier is based on non-parametric supervised learning and is used to create classifiers with a
tree-like structure. Decision trees are easy to understand and interpret, and they
can be visualized to follow the classification process [25]. A decision tree is built top-down using a
greedy technique and consists of the following parts: a root node, which splits the features of the data points
and represents the start of the classification process; parent nodes, which split the
features into child nodes based on a certain condition; leaf nodes, which are the last nodes in the tree
and represent the output of the classification process; and branches, which are the edges
that connect the nodes [24]. The tree depth is the number of node levels in the tree;
the deeper the tree, the more complex the decision rules and the more closely the model fits the
training data, at the risk of overfitting. The decision tree uses the Gini index (GI) to measure the impurity
of each node, which can be interpreted as the quality of the class separation in that node, and a local cost
function (J) to evaluate the quality of a feature split [23].


$$GI = 1 - \sum_{k} p_k^2$$


$$J = \frac{m_{left}}{m} GI_{left} + \frac{m_{right}}{m} GI_{right}$$
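
As a worked illustration, the Gini index of a node and the cost of a candidate split can be computed as below. Here the class fraction in a node plays the role of p_k and the child sizes play the role of the weights in J; the function names are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class fractions in the node."""
    m = len(labels)
    if m == 0:
        return 0.0
    return 1.0 - sum((count / m) ** 2 for count in Counter(labels).values())


def split_cost(left_labels, right_labels):
    """Size-weighted average of the Gini impurities of the two child nodes."""
    m = len(left_labels) + len(right_labels)
    return (len(left_labels) / m) * gini(left_labels) + \
           (len(right_labels) / m) * gini(right_labels)


root = [0] * 8 + [1] * 2               # same 80/20 class ratio as the binary dataset
print(gini(root))                      # impurity before splitting
print(split_cost([0] * 8, [1] * 2))    # a split that separates the classes perfectly
```

A greedy tree builder would evaluate `split_cost` for every candidate threshold and keep the split with the lowest value.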


3.2.5. Random forests classifier 

This classifier is an ensemble algorithm for classification that uses multiple decision trees in
parallel. The random forest classifier builds the decision trees with randomization and in parallel; the
purpose of the randomness is to decrease the variance of the classifier [25], because an individual tree in
the forest exhibits high variance and tends to overfit the training data. Randomization corrects the habit of
decision trees to overfit and helps the forest generalize well to unseen data. The classifier averages
multiple deep decision trees that have been trained on different parts of the same training set, with the
goal of reducing the variance [26]. The number of decision trees and the depth of each tree are the main
parameters to adjust when using the random forest classifier. Increasing the number of trees and the depth
of each tree generally improves performance, at the cost of a more complex forest [27]. Each decision tree
in the random forest uses the Gini index to measure the impurity of each node and the quality of a feature
split.
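
Two ingredients of the forest, drawing a random part of the training set for each tree and combining the trees' outputs, can be sketched as follows. Bootstrap sampling with replacement and majority voting are common choices assumed here for illustration; the text only states that the trees are trained on different parts of the training set and then averaged.

```python
import random
from collections import Counter


def bootstrap_sample(X, y, rng):
    """Draw len(X) rows with replacement; each tree would train on one such sample."""
    picks = [rng.randrange(len(X)) for _ in X]
    return [X[i] for i in picks], [y[i] for i in picks]


def majority_vote(tree_predictions):
    """Combine the labels predicted by the individual trees for one sample."""
    return Counter(tree_predictions).most_common(1)[0][0]


rng = random.Random(0)
X = [[0.1], [0.2], [0.9], [1.1]]
y = [0, 0, 1, 1]
Xb, yb = bootstrap_sample(X, y, rng)
print(len(Xb), majority_vote([1, 0, 1]))
```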


3.2.6. AdaBoost classifier 

AdaBoost, short for adaptive boosting, is an ensemble statistical classification technique.
Its core principle is to train a sequence of weak learners on repeatedly reweighted versions of the
training dataset. The predictions from all of the weak learners are combined through a weighted majority
vote, and the final prediction is determined by the weighted sum of the votes. AdaBoost is adaptive in the
sense that subsequent weak learners are adjusted in favor of the instances misclassified by previous
classifiers [28], [29]. The AdaBoost classifier can be less vulnerable to overfitting than other machine
learning techniques. The individual learners can be weak, but as long as each one performs slightly better
than random guessing, the final model can be shown to converge to a strong learner [20].
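
One reweighting round can be sketched using the textbook discrete AdaBoost update. This variant is an assumption, since the text gives no formulas, and it encodes the labels as -1/+1 instead of the 0/1 used elsewhere in this paper; the function name is hypothetical.

```python
import math


def adaboost_round(weights, y_true, y_pred):
    """One reweighting step of discrete AdaBoost for labels in {-1, +1}."""
    total = sum(weights)
    err = sum(w for w, y, h in zip(weights, y_true, y_pred) if y != h) / total
    if not 0.0 < err < 0.5:
        raise ValueError("weak learner must have weighted error in (0, 0.5)")
    alpha = 0.5 * math.log((1.0 - err) / err)  # vote weight of this learner
    # Misclassified samples (y * h == -1) are scaled up, correct ones scaled down.
    updated = [w * math.exp(-alpha * y * h)
               for w, y, h in zip(weights, y_true, y_pred)]
    norm = sum(updated)
    return alpha, [w / norm for w in updated]


weights = [0.25, 0.25, 0.25, 0.25]
y_true = [1, 1, -1, -1]
y_pred = [1, 1, -1, 1]  # the last sample is misclassified
alpha, new_weights = adaboost_round(weights, y_true, y_pred)
```

After this round the misclassified sample carries half of the total weight, so the next weak learner is pushed to classify it correctly.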


3.2.7. SVM classifier 

Support vector machine (SVM) is a supervised machine learning classifier. It constructs a hyperplane,
or a set of hyperplanes, in a high-dimensional space, and the induced planes separate the training data points
into regions for classification [14]. A good separation is achieved by the hyperplane that has the largest
distance (margin) to the nearest training data points of any class. The larger the margin, the better the
separation and the lower the generalization error of the classifier [30]. For datasets that are not linearly
separable, like the epileptic seizure dataset, the SVM classifier maps the original finite-dimensional space
into a higher-dimensional space in which the separation becomes easier. The similarity between
every stored feature vector and the input sample in the higher-dimensional space is determined by a kernel
function (K); the radial basis function (RBF) kernel is used in the case of seizure detection. Each input
sample is classified using a similarity-weighted vote function (F) over the samples (feature vectors) in
the higher-dimensional space, which finds the most similar feature vectors and the labels associated with
them.


$$K(x, x^{(i)}) = \exp\left( -\frac{\lVert x - x^{(i)} \rVert^2}{2\sigma^2} \right)$$
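
The RBF kernel can be evaluated directly, as in the sketch below. It assumes the common parameterization with a squared Euclidean distance and a width parameter sigma; the default sigma = 1.0 is illustrative only.

```python
import math


def rbf_kernel(x, z, sigma=1.0):
    """RBF similarity: 1.0 for identical vectors, decaying toward 0 with distance."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


print(rbf_kernel([1.0, 2.0], [1.0, 2.0]))  # identical vectors
print(rbf_kernel([0.0, 0.0], [1.0, 1.0]))  # squared distance of 2
```
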
3.2.8. Naive Bayes classifier

Naive Bayes classifiers are supervised learning algorithms based on Bayes' theorem. They are
applied to the data with the naive assumption of conditional
independence among the features $(x_1, \ldots, x_n)$ of each sample (x) given the label (y) [24]. Bayes' theorem
is stated mathematically as follows,


$$P(y \mid x_1, \ldots, x_n) = \frac{P(y)\, P(x_1, \ldots, x_n \mid y)}{P(x_1, \ldots, x_n)}$$


$$\hat{y} = \arg\max_{y} P(y) \prod_{i=1}^{n} P(x_i \mid y)$$


Given a label (y) and a sample (x) that consists of the features $x_1$ through $x_n$, the
posterior probability $P(y \mid x_1, \ldots, x_n)$ can be estimated from the class prior probability $P(y)$,
the likelihood $P(x_1, \ldots, x_n \mid y)$, which is the probability of the sample given the label, and the
prior probability of the sample $P(x_1, \ldots, x_n)$. Naive Bayes classifiers do not require a huge amount
of data to make accurate predictions, and they are extremely fast and can be used for real-time detection [31].
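
The decision rule can be sketched with a Gaussian likelihood for each feature. The Gaussian form is an assumption, since the text does not say which naive Bayes variant was used, and the helper names and toy data are hypothetical. Working with log-probabilities turns the product over features into a sum and avoids numerical underflow.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate the class prior and per-feature mean and variance for each label."""
    grouped = defaultdict(list)
    for row, label in zip(X, y):
        grouped[label].append(row)
    model = {}
    for label, rows in grouped.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9  # avoid zero variance
                     for col, m in zip(columns, means)]
        model[label] = (n / len(X), means, variances)
    return model


def predict_gaussian_nb(model, x):
    """Return the label that maximizes log P(y) + sum_i log P(x_i | y)."""
    best_label, best_score = None, -math.inf
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        for v, m, var in zip(x, means, variances):
            score += -0.5 * math.log(2.0 * math.pi * var) - (v - m) ** 2 / (2.0 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


X = [[1.0, 1.2], [0.8, 1.0], [1.2, 0.9], [5.0, 5.2], [4.8, 5.1], [5.3, 4.9]]
y = [0, 0, 0, 1, 1, 1]
model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, [1.0, 1.0]), predict_gaussian_nb(model, [5.0, 5.0]))
```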


4. RESULTS AND DISCUSSION

Each machine learning classifier was trained on the training set to find the optimal set of parameters
and hyper-parameters. The performance of each algorithm was then measured on the unseen test set using
different performance metrics: accuracy, receiver operating characteristics (ROC), mean absolute error
(MAE), root-mean-square error (RMSE) and detection time, in order to evaluate each classifier for
automatic seizure detection in real time. The results for the evaluation metrics are listed in Table 3.


Table 3. Classifiers' performance comparison

Classifier           Test accuracy  ROC      MAE     RMSE   Time (s)
K-NN                 95.260%        88.152%  0.047   0.217  0.007
Logistic regression  82.000%        55.163%  0.18    0.424  0.002
SGD classifier       84.26%         66.44%   0.157   0.396  0.001
Decision tree        93.913%        87.228%  0.06    0.246  0.004
Random forest        97.608%        96.059%  0.023   0.154  0.0145
AdaBoost             94.521%        90.054%  0.0547  0.234  0.029
SVM                  97.173%        94.320%  0.028   0.168  0.0009
Naive Bayes          95.913%        93.043%  0.0408  0.202  0.0001
The random forest classifier yielded the highest performance, with a test accuracy of 97.608% and
an ROC of 96.059%. It is followed by the support vector machine (SVM) classifier with a
test accuracy of 97.173% and an ROC of 94.320%, the naive Bayes classifier with a test accuracy of
95.913% and an ROC of 93.043%, and the AdaBoost classifier with a test accuracy of 94.521% and an
ROC of 90.054%. Based on these results, these classifiers, especially the random forest
classifier, can be used to detect epileptic seizures in real time with high accuracy.
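
Several of the metrics in Table 3 can be computed from predicted and true labels as sketched below. For 0/1 labels the MAE equals the misclassification rate and the RMSE is its square root, which matches the K-NN row (1 - 0.9526 is about 0.047, and the square root of 0.047 is about 0.217). The function names, the timing helper and the toy labels are assumptions; the ROC column is left out because the text does not say how it was computed.

```python
import math
import time


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error between true and predicted labels."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root-mean-square error between true and predicted labels."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def timed_predict(predict_fn, samples):
    """Run predict_fn on every sample and return (predictions, elapsed seconds)."""
    start = time.perf_counter()
    predictions = [predict_fn(s) for s in samples]
    return predictions, time.perf_counter() - start


y_true = [0, 0, 0, 0, 1]
y_pred = [0, 0, 0, 1, 1]
print(accuracy(y_true, y_pred), mae(y_true, y_pred), rmse(y_true, y_pred))
```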


5. CONCLUSION 

In this paper, we demonstrated the efficiency and accuracy of the machine learning approach for
automatic detection of seizures based on EEG signals. Compared to traditional methods used to detect
seizures, the machine learning approach can learn the high-level features that represent different seizure
patterns and can effectively discriminate between seizure and normal EEG signals. Another major
advantage of machine learning classifiers is that they are very fast and can be used for real-time seizure
detection with high accuracy and low latency. Machine learning classifiers are also robust and
less sensitive to different types of noise, so they can be used in noisy conditions with less concern about
the effect of noise on detection accuracy. With all of these advantages, the machine learning approach
demonstrates its effectiveness over the traditional time- and frequency-domain methods. The results of each
classifier were tested several times to accurately examine the performance with respect to the different
performance metrics on the experimental Bonn University epileptic seizure dataset.